A Fault-Tolerant Exascale Parallel Runtime

Authors

  • Amos Waterland
  • Jonathan Appavoo
  • Elaine Angelino
  • Ryan Adams
  • Margo Seltzer
Abstract

Related resources

The Impact of a Fault Tolerant MPI on Scalable Systems Services and Applications

Exascale-targeted scientific applications must be prepared for a highly concurrent computing environment where failure will be a regular event during execution. Natural and algorithm-based fault tolerance (ABFT) techniques can often manage failures more efficiently than traditional checkpoint/restart techniques alone. Central to many petascale applications is an MPI standard that lacks support ...

Fault Tolerance Lessons Applied to Parallel Computing

This paper describes an approach to fault-tolerant parallel computing which is based on the experiences with the most successful fault-tolerant software – the transaction processing systems. The algorithms presented here have less runtime overhead and faster recovery than most preceding approaches. In the Pact parallel programming environment fault tolerance is provided fully user transparent i...

Multiscale computing in the exascale era

We expect that multiscale simulations will be one of the main high performance computing workloads in the exascale era. We propose multiscale computing patterns as a generic vehicle to realise load balanced, fault tolerant and energy aware high performance multiscale computing. Multiscale computing patterns should lead to a separation of concerns, whereby application developers can compose mult...

Fault-Tolerant Parallel Programming with Atomic Actions

The Pact (parallel actions) parallel programming environment provides an easy-to-use parallel execution and synchronization model based on task parallelization. To give the programmer an abstraction for global data (even on distributed memory machines) the Pact runtime system uses virtual shared memory. Execution efficiency is improved with data-dependent dynamic load balancing and latency-ma...

Concurrent C: real-time programming and fault tolerance

Concurrent C is an upward-compatible parallel extension of C which runs on a variety of uniprocessors and multiprocessors. A Concurrent C program consists of a set of processes which execute in parallel and interact with each other by sending messages. Fault-Tolerant (FT) Concurrent C, an extension of Concurrent C, is a tool for writing fault-tolerant distributed programs, based on the replicat...

Publication date: 2012